Efficient Policies for Stationary Possibilistic Markov Decision Processes
Authors
Abstract
Possibilistic Markov Decision Processes offer a compact and tractable way to represent and solve problems of sequential decision making under qualitative uncertainty. Although appealing for its ability to handle qualitative problems, this model suffers from the drowning effect that is inherent to possibilistic decision theory. The present paper proposes to escape the drowning effect by extending the lexicographic preference relations defined in [6] for non-sequential decision problems to stationary possibilistic MDPs, and provides a value iteration algorithm that computes policies optimal for these new criteria.
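As a rough illustration of the dynamic programming machinery involved, here is a minimal Python sketch of classical optimistic possibilistic value iteration, u_{k+1}(s) = max_a max_{s'} min(π(s'|s,a), u_k(s')). The toy states, transition possibilities, and utilities are assumptions made up for this example; the lexicographic criteria proposed in the paper refine this max-min comparison (to escape the drowning effect) rather than reproduce it.

```python
# Minimal sketch of optimistic possibilistic value iteration on a toy
# stationary possibilistic MDP. State names, possibility degrees, and
# utilities are illustrative assumptions, not data from the paper.

states = ["s0", "s1"]
actions = ["a", "b"]

# pi[s][a][s2] = possibility of reaching s2 when doing a in s
# (for each (s, a), the max over s2 is 1, as possibility theory requires).
pi = {
    "s0": {"a": {"s0": 1.0, "s1": 0.5}, "b": {"s0": 0.2, "s1": 1.0}},
    "s1": {"a": {"s0": 1.0, "s1": 0.3}, "b": {"s0": 0.6, "s1": 1.0}},
}

# Qualitative utility of each state.
mu = {"s0": 0.3, "s1": 1.0}

def optimistic_value_iteration(horizon):
    """Iterate u_{k+1}(s) = max_a max_{s2} min(pi(s2|s,a), u_k(s2))."""
    u = dict(mu)
    policy = {}
    for _ in range(horizon):
        new_u = {}
        for s in states:
            best_a, best_v = None, -1.0
            for a in actions:
                v = max(min(pi[s][a][s2], u[s2]) for s2 in states)
                if v > best_v:
                    best_a, best_v = a, v
            new_u[s], policy[s] = best_v, best_a
        u = new_u
    return u, policy

print(optimistic_value_iteration(horizon=5))
```

Ties under min and max are exactly where the drowning effect shows up: two actions can receive the same qualitative value even when one dominates the other on every other outcome, which is the situation the lexicographic refinement is designed to break.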
Similar Papers
On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
We consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. Using Value and Policy Iteration with some error ε at each iteration, it is well known that one can compute stationary policies that are 2γ/(1−γ)² · ε-optimal. After arguing that this guarantee is tight, we develop variations of Value and Policy Iter...
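To make the guarantee concrete (a numeric illustration of ours, not taken from the abstract): with discount γ = 0.9 and per-iteration error ε = 0.01, the bound gives 2γ/(1−γ)² · ε = (2 × 0.9)/(0.1)² × 0.01 = 1.8, so the guaranteed loss scales with 1/(1−γ)² and can far exceed the per-iteration error as γ approaches 1.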
Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes
This paper studies a discrete-time total-reward Markov decision process (MDP) with a given initial state distribution. A (randomized) stationary policy can be split on a given set of states if the occupancy measure of this policy can be expressed as a convex combination of the occupancy measures of stationary policies, each selecting deterministic actions on the given set and coinciding with th...
Finite-Horizon Markov Decision Processes with State Constraints
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize costs) in a given stochastic dynamical environment. In many practical scenarios (multi-agent systems, telecommunication, queuing, etc.), the decision-making probl...
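As context for the objective described above, here is a minimal Python sketch of finite-horizon value iteration for expected total reward; the two-state chain, rewards, and horizon are invented for illustration, and the paper's actual contribution (handling state constraints) is not captured here.

```python
# Finite-horizon value iteration for expected total reward.
# All numbers below are illustrative assumptions.

n_states, n_actions, horizon = 2, 2, 4

# P[a][s][s2] = probability of moving from s to s2 under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
]
# R[a][s] = immediate expected reward of taking action a in state s.
R = [
    [1.0, 0.0],
    [0.5, 2.0],
]

V = [0.0] * n_states  # terminal values
for _ in range(horizon):
    V = [
        max(
            R[a][s] + sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]

print(V)  # optimal expected total reward over the horizon, per start state
```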
Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes
It is well known that any finite state Markov decision process (MDP) has a deterministic memoryless policy that maximizes the discounted long-term expected reward. Hence for such MDPs the optimal control problem can be solved over the set of memoryless deterministic policies. In the case of partially observable Markov decision processes (POMDPs), where there is uncertainty about the world state,...
First results for a mathematical theory of possibilistic Markov processes
We provide basic results for the development of a theory of possibilistic Markov processes. We define and study possibilistic Markov processes and possibilistic Markov chains, and derive a possibilistic analogue of the Chapman-Kolmogorov equation. We also show how possibilistic Markov processes can be constructed using one-step transition possibilities.
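As a small sketch of that last point (our illustration: the three-state chain is an assumption, and we use the standard max-min composition underlying the possibilistic Chapman-Kolmogorov equation):

```python
# Two-step transition possibilities from one-step ones via max-min
# (sup-min) composition: T2[i][j] = max_k min(T[i][k], T[k][j]).
# The chain below is an illustrative assumption.

states = range(3)

# T[i][j] = possibility of moving from state i to state j in one step.
T = [
    [1.0, 0.4, 0.1],
    [0.3, 1.0, 0.6],
    [0.0, 0.7, 1.0],
]

def max_min_compose(A, B):
    """C[i][j] = max_k min(A[i][k], B[k][j])."""
    return [
        [max(min(A[i][k], B[k][j]) for k in states) for j in states]
        for i in states
    ]

T2 = max_min_compose(T, T)  # possibility of reaching j from i in two steps
print(T2)
```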